.GROUP SKIP 20
.cb CIRCUMSCRIPTION - A FORM OF NON-MONOTONIC REASONING


Abstract: Humans and intelligent computer programs must often jump to
the conclusion that the objects they can determine to have certain
properties or relations are the only objects that do.
%2Circumscription%1 is
one formalization of this kind of conjectural reasoning.

.skip to column 1
.cb CIRCUMSCRIPTION - A FORM OF NON-MONOTONIC REASONING


#. Introduction - the need for non-monotonic reasoning

	Humans and intelligent computer programs must often jump to the
conclusion that the objects they know about in a certain class are the
only objects that have to be considered in solving a certain problem.
%2Circumscription%1 is one formalization of this kind of
conjectural reasoning.

	While it is often convenient to represent information by sentences
in a mathematical logical system, such as a first order language appropriate
to a given problem domain, and to use the logical rules of inference
to reason towards a solution of a problem, the rules of inference of
mathematical logic have always been monotonic.  
	Humans and many computer programs designed to behave intelligently
represent much information as sentences in some language and solve
problems by inferring further sentences from previous sentences.
An action is performed when it is concluded that the action is
appropriate.  The most straightforward approach is to represent
the information as sentences in a %2first order language%1 appropriate
to the problem domain and to reason by applying the rules of inference
of first order logic.  Other systems of mathematical logic may be
used, but this doesn't make much difference, because any other theory
can be modelled in a first order theory.

	This paper formalizes a form of non-monotonic reasoning called
%2circumscription%1 that we believe is used by humans and required for
computer programs that solve problems by intelligent reasoning from
premisses.

	Human intelligence includes, and artificial intelligence requires,
a form of reasoning different from that included in mathematical
logic.  All the systems studied in mathematical logic have the
following %2monotonicity property%1.  If a sentence ⊗p follows
from a collection ⊗A of sentences and %2A_⊂_B%1, then ⊗p follows
from ⊗B.  In the notation of proof theory: if %2A_qi_p%1 and
%2A_⊂_B%1, then %2B_qi_p%1.
Monotonicity is a consequence of the notion of a proof as a
sequence of sentences each of which is either a premiss or
follows from a subset of the sentences occurring earlier in the
proof by one of the rules of inference.  Rules of inference
in logic are themselves %2monotonic%1 in that applying a rule
requires that its premiss sentences be present earlier in the
proof but makes no requirement that certain sentences not be
present.
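
	To make the contrast concrete, here is a small Python sketch (ours,
not part of the paper) over the two letters ⊗p and ⊗q.  Ordinary
propositional entailment is monotonic, while the minimal entailment
discussed later in the paper is not: adding the fact ⊗q to the single
fact ⊗p destroys the conclusion %2¬q%1.  A model is represented as the
set of letters assigned true; the function names are ours.

from itertools import combinations

LETTERS = ("p", "q")

def models(sentence):
    # all truth assignments (as sets of true letters) satisfying the sentence
    return [frozenset(s) for n in range(len(LETTERS) + 1)
            for s in combinations(LETTERS, n)
            if sentence({x: (x in s) for x in LETTERS})]

def entails(ax, p):
    # ordinary entailment: p holds in every model of ax
    return all(p({x: (x in m) for x in LETTERS}) for m in models(ax))

def min_entails(ax, p):
    # minimal entailment: p holds in every minimal model of ax
    ms = models(ax)
    return all(p({x: (x in m) for x in LETTERS})
               for m in ms if not any(n < m for n in ms))

A = lambda v: v["p"]                         # the fact p
B = lambda v: v["p"] and v["q"]              # the facts p and q: A plus one more fact
print(entails(A, lambda v: v["p"]),
      entails(B, lambda v: v["p"]))          # True True: entailment is monotonic
print(min_entails(A, lambda v: not v["q"]),
      min_entails(B, lambda v: not v["q"]))  # True False: minimal entailment is not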

	We envisage using minimal inference in AI as follows:
Suppose a system has a certain collection of facts.  It may or
may not have all the relevant facts, but it is reasonable for
it to conjecture that it does and to examine the consequences of the
conjecture.  The minimal completion axiom for a set of statements
is an expression of the hypothesis that the only objects that
exist are those which must exist on the basis of these statements.
In order to make this workable in practice, it will be necessary
to apply minimal completion to subsets of the facts; different
subsets will give rise to different conjectures.  Moreover, we
will need to apply the relativization to only some of the sorts
of individuals, so that we can remain noncommittal about whether
there are more of certain sorts than our axioms prove to exist.

	An example that has been much
studied in AI and which contains enough detail to illustrate
various issues is the %2missionaries and cannibals%1 problem, stated
as follows:

	%2"Three missionaries and three cannibals come to a river.  A
rowboat that seats two is available.  However, if the cannibals ever
outnumber the missionaries on either bank of the river, the outnumbered
missionaries will be eaten.  How shall they cross the river?"%1.

	Amarel (1971) considered various representations of the problem
and discussed criteria whereby the following representation is preferred
for purposes of AI, because it leads to the smallest state space that must
be explored to find the solution.  We represent a state by the triplet
consisting of the numbers of missionaries, cannibals and boats on the
initial bank of the river. The initial configuration is 331, the desired
final configuration is 000, and one solution is given
by the sequence (331,220,321,300,311,110,221,020,031,010,021,000)
of states.
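
	The following Python sketch (ours, not the paper's) searches Amarel's
state space by breadth first search, using the same triple of missionaries,
cannibals and boats on the initial bank as the state, and prints an
eleven-crossing solution of the kind listed above.

# Breadth-first search over Amarel's state space for the missionaries
# and cannibals problem.  A state is (m, c, b): missionaries, cannibals
# and boats on the initial bank.
from collections import deque

def safe(m, c):
    # missionaries are never outnumbered on either bank
    return (m == 0 or m >= c) and (m == 3 or 3 - m >= 3 - c)

def successors(state):
    m, c, b = state
    for dm in range(3):
        for dc in range(3):
            if 1 <= dm + dc <= 2:                      # the boat seats one or two
                if b == 1 and dm <= m and dc <= c:
                    yield (m - dm, c - dc, 0)          # crossing away
                if b == 0 and dm <= 3 - m and dc <= 3 - c:
                    yield (m + dm, c + dc, 1)          # crossing back

def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    queue, parent = deque([start]), {start: None}
    while queue:
        s = queue.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return list(reversed(path))
        for t in successors(s):
            if safe(t[0], t[1]) and t not in parent:
                parent[t] = s
                queue.append(t)

print(solve())    # twelve states, from (3, 3, 1) to (0, 0, 0)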

	We are not presently concerned with the heuristics of the problem
but rather with the correctness of the reasoning that goes from the
English statement of the problem to Amarel's state space representation.
A generally intelligent computer program should be able to
carry out this reasoning.  Of
course there are the well known difficulties in making computers
understand English, but suppose the English sentences describing the
problem have already been rather directly translated into first order
logic.  The correctness of Amarel's representation is not an ordinary
logical consequence of these sentences for two further reasons.

	First, nothing has been stated about the properties of boats or
even the fact that rowing across the river doesn't change the numbers of
missionaries or cannibals or the capacity of the boat.  These are a
consequence of common sense knowledge, so let us imagine that common sense
knowledge, or at least the relevant part of it, is also expressed in first
order logic.

	The second reason we can't ⊗deduce the propriety of Amarel's
representation is deeper.  Imagine giving someone the problem, and after
he puzzles for a while, he suggests going upstream half a mile and
crossing on a bridge.  "What bridge", you say.  "No bridge is mentioned in
the statement of the problem."  And this dunce replies, "Well, it isn't
stated that there isn't a bridge".  You look at the English and even at
the translation of the English into first order logic, and you must admit
that it isn't stated that there is no bridge.  So you modify the problem
to exclude bridges
and pose it again, and the dunce proposes a helicopter, and after you
exclude that he proposes a winged horse or that the others hang onto the
outside of the boat while two row.  You now see that while a dunce, he is
an inventive dunce, and you despair of getting him to accept the problem
in the proper puzzler's spirit and tell him the solution.  You are further
annoyed when he attacks your solution on the grounds that the boat might
have a leak or
not have oars.  After you rectify that omission, he suggests that a sea
monster may swim up the river and may swallow the boat.  Again you are
frustrated, and you look for a mode of reasoning that will settle his hash
once and for all.

	Minimal models and minimal inference may give the solution.  In a
minimal model of the first order logic statement of the problem, there is
no bridge or helicopter.  "Ah", you say, "but there aren't any oars
either".  No, we get out of that as follows:  It is a part of common
knowledge that a boat can be used to cross a river %2unless there is
something wrong with it or something else prevents using it%1, and in the
minimal model of the facts there doesn't exist something wrong with the
boat, and there isn't anything else to prevent its use.

	The non-monotonic character of minimal reasoning is just what we
want here.  The statement, %2"There is a bridge a mile upstream."%1
doesn't contradict the text of the problem, but its addition invalidates
the Amarel representation.

	In the usual sort of puzzle, there can be no additional source
of information beyond the puzzle and common sense knowledge.  Therefore,
the convention can be explicated as taking the minimal completion
of the puzzle statement and a certain part of common sense knowledge.
If one really were sitting by a river bank and these six people came
by and posed their problem, one wouldn't immediately take the minimal
completion for granted, but one would consider it as a hypothesis.

	At first I considered this solution too ad hoc to be
acceptable, but I am now convinced that it is a correct explication of the
%2normal%1 situation, e.g. unless something abnormal exists, an object can
be used to carry out its normal function.

	Some have suggested that the difficulties might be avoided by
introducing probabilities.  They suggest that the existence of a bridge is
improbable.  Well, the whole situation involving cannibals with the
postulated properties is so improbable that it is hard to take seriously
the conditional probability of a bridge given the hypotheses.  Moreover,
we mentally propose to ourselves the normal non-bridge non-sea monster
interpretation ⊗before considering these extraneous possibilities, let
alone their probabilities, i.e. we usually don't even introduce the sample
space in which these possibilities are assigned whatever probabilities one
might consider them to have.  Therefore, regardless of our knowledge of
probabilities, we need a way of formulating the normal situation from the
statement of the facts, and the minimal completion axiom seems to be a method.

	Of course, we haven't given an expression of common sense knowledge
in a form that says a boat can be used to cross rivers unless there is
something wrong with it.  In particular, we haven't introduced into
our ⊗ontology (the things that exist) a category that includes
%2something wrong with a boat%1 or a category that includes
%2something that may prevent its use%1.
Incidentally, once we have decided to admit %2something wrong with the boat%1,
we are inclined to admit a %2lack of oars%1 as such a something and to ask
questions like, %2"Is a lack of oars all that is wrong with the boat?%1".

	I imagine that many philosophers
and scientists would be reluctant to introduce such ⊗things, but the
fact that ordinary language allows %2"something wrong with the boat"%1
suggests that we shouldn't be hasty in excluding it.  Making a suitable
formalism may be technically difficult and philosophically problematical,
but we must try.

	In compensation, we may get an increased ability to understand
natural language, because if natural language admits something like
minimal reasoning, it is understandable that we will have difficulty
in translating it into logical formalisms that don't.

	The possibility of forming minimal completions of various
parts of our knowledge introduces a kind of fuzziness that I believe
may prove more fruitful than that explicated by fuzzy set theory.

	When humans and many recent computer programs reason from a
collection of facts, they often reach conclusions that depend on the
absence of certain information.  For example, in the absence of
information to the contrary, they may make default assumptions.  Thus if
it is stated that a boat is available at one bank of a river, it is often
concluded at least tentatively that the boat can be used to cross the
river.  For a reason shortly to be explained, we call such reasoning
%2non-monotonic%1.

	Here is a formalization of the blocks world in which circumscription
is used to keep open the set of possible circumstances that may
prevent a block ⊗x from being moved onto a block ⊗y.  We use the
situation calculus of (McCarthy and Hayes 1969).

	A situation ⊗s is a snapshot of the world at an instant of
time.  We write ⊗on(x,y,s) to say that block ⊗x is on block ⊗y in
situation ⊗s.  %2s'_=_result(a,s)%1 is the new situation ⊗s' that results
when action ⊗a is performed in situation ⊗s.  One such action is
⊗move(x,y), an attempt to move block ⊗x onto block ⊗y.
  The effect
of this action is given by

!!b1:	%2∀x y s.(∀r.¬prevmove(r,x,y,s) ⊃ on(x,y,result(move(x,y),s)))%1

which asserts that ⊗x is on ⊗y after the move if there is no
reason ⊗r that prevents the move.

	Suppose that one reason why a block ⊗x might not be movable
is that it is too heavy.  We express this by

!!b2:	%2∀x y s.(tooheavy x ⊃ prevmove(heaviness x,x,y,s))%1,

using both the predicate ⊗tooheavy and the abstract entity
⊗heaviness_x which can be a reason.  Here are some axioms about
reasons:

!!b3:	%2∀x.(¬tooheavy x ⊃ heaviness x = JUNK)%1,

!!b4:	%2¬isreason JUNK%1,

and

!!b5:	%2∀r x y s.(prevmove(r,x,y,s) ⊃ isreason r)%1.

	We must also be able to ensure that blocks, situations, actions,
the object ⊗JUNK and reasons are mutually disjoint.  This can either be
done by appropriate axioms or more elegantly by using a many sorted logic
in which variables ranging over these classes are declared to be of
appropriate distinct sorts.

	Now suppose that we have blocks ⊗A and ⊗B, and we apply the
circumscription schema to the above axioms and the constants ⊗A, ⊗B, ⊗JUNK 
and an initial situation ⊗S0.  The schema will allow us to prove
%2on(x,y,result(move(x,y),S0))%1, because the minimal model will not
contain any reason why ⊗x cannot be moved onto ⊗y.  In this minimal
model, %2heaviness x%1 will have the value ⊗JUNK.
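
	The following Python sketch (ours) checks this claim on a finite
grounding of the above axioms.  The blocks are ⊗A and ⊗B, the only
situation considered is ⊗S0, and a single hypothetical reason object
⊗R1 is allowed besides ⊗JUNK.  In place of the circumscription schema,
the sketch keeps just the models whose set of positive ⊗tooheavy, 
⊗prevmove and ⊗isreason facts is subset-minimal; in every such model
nothing prevents %2move(A,B)%1, so the first axiom gives
%2on(A,B,result(move(A,B),S0))%1.

# Ground models of the blocks-world axioms over blocks {A, B}, situation S0
# and candidate reasons {JUNK, R1}, keeping those with a subset-minimal set
# of positive facts as a finite stand-in for circumscription.
from itertools import product

BLOCKS, REASONS = ("A", "B"), ("JUNK", "R1")
PREV = [(r, x, y) for r in REASONS for x in BLOCKS for y in BLOCKS]

def models():
    for tooheavy in product([False, True], repeat=2):
        for heaviness in product(REASONS, repeat=2):
            for isreason in product([False, True], repeat=2):
                for prev in product([False, True], repeat=len(PREV)):
                    TH = dict(zip(BLOCKS, tooheavy))
                    H = dict(zip(BLOCKS, heaviness))        # value of heaviness x
                    IR = dict(zip(REASONS, isreason))
                    PM = dict(zip(PREV, prev))              # prevmove(r,x,y,S0)
                    if IR["JUNK"]:                          # axiom b4
                        continue
                    if any(TH[x] and not PM[(H[x], x, y)]   # axiom b2
                           for x in BLOCKS for y in BLOCKS):
                        continue
                    if any(not TH[x] and H[x] != "JUNK" for x in BLOCKS):  # axiom b3
                        continue
                    if any(PM[k] and not IR[k[0]] for k in PREV):          # axiom b5
                        continue
                    yield TH, H, IR, PM

def facts(m):
    TH, H, IR, PM = m
    return (frozenset(x for x in BLOCKS if TH[x])
            | frozenset(r for r in REASONS if IR[r])
            | frozenset(k for k in PREV if PM[k]))

all_models = list(models())
minimal = [m for m in all_models
           if not any(facts(n) < facts(m) for n in all_models)]

# In no minimal model is there a reason preventing move(A,B) in S0,
# so axiom b1 gives on(A,B,result(move(A,B),S0)) there.
assert all(not m[3][(r, "A", "B")] for m in minimal for r in REASONS)
print(len(minimal), "minimal model(s)")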

.bb Co-ordinate systems and propositional calculus circumscription.

	Suppose we want to establish a language in which to express some
class of assertions.  Deciding on predicate and function symbols and their
meanings in order to express these assertions is like choosing a co-ordinate
system for a physical space.  For example, in the blocks world one may
express everything in terms of the relation ⊗on(x,y,s) meaning that block
⊗x is on block ⊗y in situation ⊗s, but one might also use the relation
⊗above(x,y,s) that would be satisfied if there were intervening blocks
in between ⊗x and ⊗y.  The same assertions about blocks can be made
in either language.  In particular, each can be defined in terms of the
other, and equivalent sets of axioms can be written.
(I don't know of any general study of "logical co-ordinate systems" and
their relations and transformations.  It will certainly be more complex
than the theory of linear transformations of vector spaces).

	Logical co-ordinate systems that are equivalent in terms of what
can be expressed may be inequivalent heuristically.
Conjectures that are natural in one system may seem contrived in others.
In particular,
circumscription yields different conjectures in different co-ordinate
systems.  A propositional calculus
version of circumscription illustrates some of the issues.

	A propositional language for expressing some class
of assertions is established by choosing
propositional letters %2p%41%2,...,p%4n%1 and regarding each of them as
expressing some elementary fact.
We may then express the set ⊗A of known
facts as a sentence ⊗AX in terms of the %2p%1's.  A model of the facts
is any assignment of truth values to the %2p%4i%1's that makes ⊗AX true.

	As a simple example, we may be thinking
of assertions ⊗p, ⊗q, and ⊗r related by %2p_≡_(q_≡_r)%1
and satisfying %2p_∨_r%1.
One co-ordinate system would take %2p%41%2_=_p%1 and %2p%42%2_=_q%1,
and another would take %2p%41%2'_=_q%1 and %2p%42%2'_=_¬(p_≡_r)%1.
In one system we would have the axiom %2p%41%2

***********

	Let ⊗M1 and ⊗M2 be models of ⊗AX. We define

	%2M1 ≤ M2 ≡  ∀i.(true(p%4i%2,M1) ⊃ true(p%4i%2,M2))%1.

A minimal model is one that is not a proper extension of another model,
i.e. a model in which (roughly speaking) as many
of the %2p%4i%1s as possible are false - consistent with the truth
of %2AX%1.  %2Note that while the models of a set of facts are
independent of the co-ordinate system, the minimal models depend on the
co-ordinate system%1.  Here is the most trivial example.
Let there be one letter ⊗p representing the one fact in a
co-ordinate system ⊗C, and let ⊗A be the null set of facts so that ⊗AX is
the sentence ⊗T (identically true).  The one other co-ordinate system ⊗C' 
uses the elementary sentence %2p'%1 taken equivalent to %2¬p%1.
The minimal model of ⊗C is
α{¬pα}, and the minimal model of ⊗C' is α{pα}.
Since minimal reasoning depends on the co-ordinate system and not merely
on the facts expressed in the axioms, the utility of an axiom system
will depend on whether its minimal models express the uses we wish to
make of Occam's razor.

	A less trivial example of minimal entailment is the following.
The propositional co-ordinate system has basic sentences %2p%41%1, %2p%42%1
and %2p%43%1, and %2AX = p%41%2 ∨ p%42%1.  There are two minimal models.
%2p%43%1 is false in
both, and %2p%41%1 is false in one, and %2p%42%1 is false in the other.
Thus we have %2AX_%8q%4m%2_(¬(p%41%2_∧_p%42%2)_∧_¬p%43%1).

	Note that if ⊗AX is a conjunction of certain elementary sentences
(some of them negated), e.g. %2AX_=_p%42%2_∧_¬p%44%1, then there is a unique
minimal model, but if, as above, %2AX_=_p%42%2_∨_p%44%1, then there
is more than one minimal model.
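
	Here is a small Python sketch (ours) that enumerates the minimal
models of a propositional sentence, representing a model by the set of
letters assigned true; it reproduces the two examples just given.

from itertools import combinations

def minimal_models(letters, ax):
    # ax is a predicate on an assignment, given as a dict letter -> truth value
    models = [frozenset(s)
              for n in range(len(letters) + 1)
              for s in combinations(letters, n)
              if ax({p: (p in s) for p in letters})]
    return [m for m in models if not any(n < m for n in models)]

# AX = p1 ∨ p2 over p1, p2, p3: two minimal models, and p3 is false in both.
print(minimal_models(["p1", "p2", "p3"], lambda v: v["p1"] or v["p2"]))
# -> [frozenset({'p1'}), frozenset({'p2'})]

# AX = p2 ∧ ¬p4: a unique minimal model.
print(minimal_models(["p2", "p4"], lambda v: v["p2"] and not v["p4"]))
# -> [frozenset({'p2'})]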

	One may also consider extensions %2relative%1 to a set ⊗B of
elementary sentences.  We have

	%2M1 ≤ M2 (mod B) ≡ ∀p.(p ε B ∧ true(p,M1) ⊃ true(p,M2))%1.

We can then introduce minimal models relative to ⊗B and the corresponding
relative minimal entailment.  Thus we may care about minimality and
Occam's razor only over a certain part of our knowledge.
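
	Continuing in the same style, the following self-contained Python
sketch (ours) computes minimal models relative to a set ⊗B of letters,
minimizing only the letters in ⊗B.  Minimizing only %2p%41%1 in
%2p%41%2_∨_p%42%1 keeps just the model in which %2p%41%1 is false.

from itertools import combinations

def minimal_models_mod(letters, ax, B):
    # models of ax that are minimal with respect to the letters in B only
    models = [frozenset(s)
              for n in range(len(letters) + 1)
              for s in combinations(letters, n)
              if ax({p: (p in s) for p in letters})]
    key = lambda m: m & frozenset(B)
    return [m for m in models if not any(key(n) < key(m) for n in models)]

print(minimal_models_mod(["p1", "p2"], lambda v: v["p1"] or v["p2"], ["p1"]))
# -> [frozenset({'p2'})]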

	Note that we may discover a set of facts one at a time so that we
have an increasing sequence %2A%41%2 ⊂ A%42%1 ⊂ ... of sets of sentences.
A given sentence ⊗p may oscillate in its minimal entailment by the %2A%4i%1,
e.g. we may have %2A%41%8_q%4m%2_p%1, %2A%42%8_q%4m%2_¬p%1, neither may be
entailed by %2A%43%1, etc.  The common sense Occam's razor conclusion
about a sentence ⊗p often behaves this way as our knowledge increases.
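
	For instance (our example, not the paper's), take letters ⊗q, ⊗r 
and ⊗t and the sentence %2q_≡_r%1.  The empty set of facts minimally
entails it, adding %2q_∨_r%1 makes its negation minimally entailed, and
further adding %2t_∨_(q_∧_r)%1 gives a set that minimally entails neither.
The following Python sketch checks this.

from itertools import combinations

LETTERS = ("q", "r", "t")

def min_entails(ax, p):
    # p holds in every minimal model of ax; a model is the set of true letters
    models = [frozenset(s) for n in range(len(LETTERS) + 1)
              for s in combinations(LETTERS, n)
              if ax({x: (x in s) for x in LETTERS})]
    return all(p({x: (x in m) for x in LETTERS})
               for m in models if not any(n < m for n in models))

s = lambda v: v["q"] == v["r"]                       # the sentence q ≡ r
A1 = lambda v: True                                  # no facts
A2 = lambda v: v["q"] or v["r"]                      # add q ∨ r
A3 = lambda v: (v["q"] or v["r"]) and (v["t"] or (v["q"] and v["r"]))
print(min_entails(A1, s))                            # True
print(min_entails(A2, lambda v: not s(v)))           # True
print(min_entails(A3, s),
      min_entails(A3, lambda v: not s(v)))           # False False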

	Reasons can be used to reduce propositional circumscription to
domain circumscription.  We introduce for each propositional letter
%2p%4i%1 to be minimized a corresponding predicate %2pr%4i%2(r)%1 and the
axiom

!!g1:	%2p%4i%2 ≡ ∃r.pr%4i%2(r)%1,

which asserts that %2p%4i%1 is true if and only if there is a reason why.
We also need axioms

!!g2:	%2∀r.¬(pr%4i%2(r) ∧ pr%4j%2(r))%1

for each ⊗i and ⊗j and axioms asserting that reasons ⊗r are distinct from
any other individuals.  A minimal model will then have as few reasons as
possible and hence will minimize the set of true %2p%4i%1s.
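
	Here is a finite Python sketch (ours) of this reduction for
%2AX_=_p%41%2_∨_p%42%1.  Two candidate reason objects are allowed; the
grounded forms of the two axioms above make %2p%4i%1 true exactly when
some reason supports it and keep the supporting sets disjoint.  The
models whose set of reasons is subset-minimal make the same letters
true as the minimal models found earlier.

from itertools import combinations

REASONS = ("r1", "r2")

def subsets(xs):
    return [frozenset(c) for n in range(len(xs) + 1)
            for c in combinations(xs, n)]

def models():
    for domain in subsets(REASONS):                # candidate domains of reasons
        for pr1 in subsets(domain):
            for pr2 in subsets(domain):
                if pr1 & pr2:                      # axiom g2: disjoint reasons
                    continue
                p1, p2 = bool(pr1), bool(pr2)      # axiom g1: p_i iff some reason
                if p1 or p2:                       # AX = p1 ∨ p2
                    yield domain, (p1, p2)

all_models = list(models())
minimal = [(d, p) for d, p in all_models
           if not any(d2 < d for d2, _ in all_models)]
print(sorted({p for _, p in minimal}))
# -> [(False, True), (True, False)], i.e. exactly one of p1, p2 is true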

.skip to column 1